Statistics / pages /3 Descriptive Statistics.py

Bhargav17

	import streamlit as st
	import pandas as pd
	import numpy as np
	import datasets

	st.title(" DESCRIPTIVE STATISTICS")
	st.header(":blue[Descriptive Statistics:]")
	st.write("Descriptive statistics summarise and describe a data set. The summary is computed from a sample of a population using measures such as the mean and standard deviation. Descriptive statistics organise, represent, and explain a set of data using charts, graphs, and summary measures. Histograms, pie charts, bar charts, and scatter plots are common ways to summarise data and present it in tables or graphs. Descriptive statistics are just that: descriptive. They describe the data at hand without drawing conclusions beyond it.")
	st.subheader(":green[Types of descriptive statistics:]")
	st.write("There are three types of descriptive statistics:")

	st.write("1. Measures of central tendency")
	st.write("2. Measures of dispersion")
	st.write("3. Distribution")

	st.header(":blue[Measures of Central Tendency :]")
	st.write("A measure of central tendency is a representative value of a data set, generally the central value or the most frequently occurring value, that gives a general idea of the whole data set.")
	st.write("Some of the most commonly used measures of central tendency are:")
	st.write("Mean")
	st.write("Median")
	st.write("Mode")

	st.subheader(":green[Mean :]")
	st.write("The mean represents the average value of the dataset. It can be calculated as the sum of all the values in the dataset divided by the number of values.")
	formula = r"""
	\text{mean} = \frac{\text{sum of the observations}}{\text{total number of observations}}
	"""
	st.latex(formula)

	st.write("There are three types of Mean")
	st.write("Arithmetic Mean")
	st.write("Geometric Mean")
	st.write("Harmonic Mean")

	st.write("Arithmetic Mean: It is the sum of the sampled values divided by the number of items in the sample.")
	st.latex(r"""\text{Arithmetic Mean (A.M.)} = \frac{\sum_{i=1}^{n} x_{i}}{n}""")
	st.write("where,")
	st.write("x1, x2, x3, . . ., xn are the observations, and")
	st.write("n is the number of observations.")

	st.write("Geometric Mean: It is an average that is useful for sets of positive numbers that are interpreted according to their product. It is the nth root of the product of the n observations.")
	st.latex(r"""\text{Geometric Mean (G.M.)} = \sqrt[n]{x_{1} \cdot x_{2} \cdot x_{3} \cdots x_{n}}""")
	st.write("where,")
	st.write("x1, x2, x3, . . ., xn are the observations, and")
	st.write("n is the number of observations.")


	st.write("Harmonic Mean: The harmonic mean is a numerical average calculated by dividing the number of observations by the sum of the reciprocals of the observations.")
	st.image("hormonic mean.jpg")
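	# A minimal sketch of the three means described above, written directly
	# from their definitions. The sample `values` and the helper names are
	# hypothetical and not part of the app; the helpers assume positive inputs.

```python
import math


def arithmetic_mean(xs):
    """Sum of the observations divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product of n positive observations."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """Number of observations divided by the sum of their reciprocals."""
    return len(xs) / sum(1 / x for x in xs)


values = [2, 4, 8]  # hypothetical sample, chosen only for illustration
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(am, gm, hm)
```

	# For positive data the three results always satisfy hm <= gm <= am.
	# Python's statistics module offers the same calculations as mean,
	# geometric_mean, and harmonic_mean.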

	st.subheader(":green[Median :]")
	st.write("The median is the middle value of the dataset once it is arranged in ascending or descending order. When the dataset contains an even number of values, the median is the mean of the two middle values. Consider the given dataset with an odd number of observations arranged in descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2.")
	st.image("median odd.jpg")
	st.write("Here 12 is the middle value, or median: it has 6 values above it and 6 values below it.")
	st.write("Now, consider another example with an even number of observations arranged in descending order: 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, and 17.")
	st.image("median even.jpg")
	st.write("In this dataset, the two middle values are 29 and 27. Now, find the mean of these two numbers.")
	st.write("i.e., (27 + 29) / 2 = 28")
	st.write("Therefore, the median for the given data distribution is 28.")
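	# The two median examples above can be checked with a short sketch. The
	# `median` helper is a hypothetical illustration and not part of the app;
	# it sorts the values and then applies the odd or even rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]
even_data = [40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, 17]

print(median(odd_data))   # 12
print(median(even_data))  # 28.0
```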

	st.subheader(":green[Mode :]")
	st.write("The mode is the most frequently occurring value in the dataset. A dataset may contain multiple modes and, in some cases, no mode at all.")
	st.write("Consider the given dataset 5, 4, 2, 3, 2, 1, 5, 4, 5")
	st.image("mode.jpg")
	st.write("Since the mode is the most common value, the mode of the given dataset is 5, which appears three times.")

	st.write("Types of Mode :")
	st.write("Unimodal")
	st.write("Bimodal")
	st.write("Trimodal")
	st.write("Multimodal")

	st.subheader(":green[Unimodal :]")
	st.write("Unimodal - A set of data with exactly one mode is called unimodal. This means that a single value occurs more often than any other.")
	st.write("Example: The mode of data set A = {14, 15, 16, 17, 15, 18, 15, 19} is 15, because 15 occurs three times and every other value occurs once.")
	st.write("Example: For the data set 2, 4, 5, 5, 6, 7, the mode is 5, since it appears twice and every other value appears once.")

	st.subheader(":green[Bimodal :]")
	st.write("Bimodal - When there are two modes in a data set, the set is called bimodal. This means that two data values share the highest frequency.")
	st.write("Example: The modes of data set A = {8, 13, 13, 14, 15, 17, 17, 19} are 13 and 17, because both 13 and 17 appear twice in the given set.")
	st.write("Example: The modes of set A = {2, 2, 2, 3, 4, 4, 5, 5, 5} are 2 and 5, because both 2 and 5 appear three times in the given set.")

	st.subheader(":green[Trimodal :]")
	st.write("Trimodal - When there are three modes in a data set, the set is called trimodal. This means that three data values share the highest frequency.")
	st.write("Example: The modes of data set A = {100, 80, 80, 95, 95, 100, 90} are 80, 95, and 100, because each of these three values appears twice and 90 appears once.")
	st.write("Example: The modes of set A = {2, 2, 2, 3, 4, 4, 5, 5, 5, 7, 8, 8, 8} are 2, 5, and 8, because each appears three times in the given set.")

	st.subheader(":green[Multimodal :]")
	st.write("Multimodal - When there are four or more modes in a data set, the set is called multimodal.")
	st.write("Example: The modes of data set A = {100, 80, 80, 95, 95, 100, 90, 90} are 80, 90, 95, and 100, because all four values appear twice in the given set.")
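	# A minimal sketch of how the modes of a data set can be found and the set
	# named by its number of modes. The `modes` and `modality` helpers are
	# hypothetical and not part of the app. As an assumption, a data set in
	# which no value repeats is treated as having no mode.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: when no value repeats, the data set has no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def modality(values):
    """Name the data set by its number of modes."""
    names = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(len(modes(values)), "multimodal")


print(modes([5, 4, 2, 3, 2, 1, 5, 4, 5]))                 # [5]
print(modality([8, 13, 13, 14, 15, 17, 17, 19]))          # bimodal
print(modality([2, 2, 2, 3, 4, 4, 5, 5, 5, 7, 8, 8, 8]))  # trimodal
```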

	st.header(":blue[Measures Of Dispersion :]")
	st.write("Dispersion is the state of getting dispersed or spread. Statistical dispersion means the extent to which numerical data is likely to vary about an average value. In other words, dispersion helps to understand the distribution of the data.")
	st.subheader(":green[Measures Of Dispersion in Statistics :]")
	st.write("Measures of dispersion help to describe the variability in data. Dispersion is a statistical term that can be used to describe the extent to which data is scattered. Thus, measures of dispersion are certain types of measures that are used to quantify the dispersion of data.")
	st.subheader(":green[Measures Of Dispersion Definition :]")
	st.write("Measures of dispersion can be defined as non-negative real numbers that measure how homogeneous or heterogeneous the given data is. The value of a measure of dispersion is 0 if all the data points in a data set are the same. As the variability of the data increases, the value of the measure of dispersion also increases.")
	st.image("Measure of dis.jpg")

	st.write("There are two types of measures of dispersion:")
	st.write("1. Absolute measure of dispersion")
	st.write("2. Relative measure of dispersion")

	st.subheader(":green[1.Absolute Measure of dispersion :]")
	st.write("Absolute - Measures of dispersion that are expressed in the same units as the data themselves are called absolute measures of dispersion. For example: meters, dollars, kg, etc.")

	st.write("Absolute measures of dispersion are:")
	st.subheader(":green[Range]")
	st.write("Range - Given a data set, the range can be defined as the difference between the maximum value and the minimum value.")
	st.image("range.png")

	st.subheader(":green[Mean Deviation :]")
	st.write("Mean Deviation - It is the arithmetic mean of the absolute differences between the values and their mean.")
	st.image("devistion.png")

	st.subheader(":green[Standard Deviation :]")
	st.write("Standard Deviation - It is the square root of the arithmetic average of the squared deviations measured from the mean.")
	st.image("std.png")

	st.subheader(":green[Variance :]")
	st.write("Variance - It is defined as the average of the squared deviations from the mean of the given data set.")
	st.image("variance.jpg")

	st.subheader(":green[Quartile Deviation]")
	st.write("Quartile Deviation - It is defined as half of the difference between the third quartile and the first quartile in a given data set.")
	st.image("https://miro.medium.com/v2/resize:fit:1024/1*1br-28_d07Ur3cXTyKEJjw.jpeg",caption="Quartile Deviation")

	st.subheader(":green[Interquartile Range]")
	st.write("Interquartile Range - The difference between the upper quartile (Q3) and the lower quartile (Q1) is called the interquartile range. Its formula is Q3 - Q1.")
	st.image("https://statsmethods.wordpress.com/wp-content/uploads/2013/05/capture.png",caption="Interquartile Range")
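	# A minimal sketch of the absolute measures of dispersion described above,
	# using only the standard library. The sample `data` is hypothetical. Two
	# assumptions: variance and standard deviation use the population form
	# (divide by n), matching the "average of the squared deviations" wording,
	# and the quartiles follow the default method of statistics.quantiles.
	# Other quartile conventions give slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

value_range = max(data) - min(data)

mean = statistics.mean(data)
# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
quartile_deviation = iqr / 2

print(value_range, mean_deviation, variance, std_dev, iqr, quartile_deviation)
```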

	st.subheader(":green[2.Relative Measure of dispersion :]")
	st.write("Relative - Relative measures of dispersion are unitless ratios, so they are used to compare the spread of two or more data sets that are measured in different units.")

	st.write("Relative measures of dispersion are:")
	st.subheader(":green[Coefficient of Range :]")
	st.write("Coefficient of Range - It is defined as the ratio of the difference between the highest and lowest value in a data set to the sum of the highest and lowest value.")
	st.image("https://bbamantra.com/wp-content/uploads/2016/09/range-coefficient-of-range.jpg")

	st.subheader(":green[Coefficient of Variation :]")
	st.write("Coefficient of Variation - It is defined as the ratio of the standard deviation to the mean of the data set. It is usually expressed as a percentage.")
	st.image("https://study.com/cimages/multimages/16/dcee854a-311f-4249-8800-d6ea1a117b398244206355149698335.png")

	st.subheader(":green[Coefficient of Mean Deviation :]")
	st.write("Coefficient of Mean Deviation - It is defined as the ratio of the mean deviation to the central value (mean or median) from which the deviations are measured.")
	st.image("https://www.zigya.com/application/zrc/images/qvar/STEN11019437-1.png")

	st.subheader(":green[Coefficient of Quartile Deviation :]")
	st.write("Coefficient of Quartile Deviation - It is defined as the ratio of the difference between the third quartile and the first quartile to the sum of the third and first quartiles.")
	st.image("https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWzFSpCpnwz2B11b8ec6BM9YNesfkAKsXuGA3L-Rwl4Vt31dTBCZ99fL9DDsKbvHPWh4wQ7H0caBQLxOjYSA3wWoskm8ROF9JFh4DG7vuUe5kvtxn6Lv4f4B2_phDjoUvi-VSlIQz9TC8r/s16000/coefficient+of+quartile+deviation.png")
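	# A minimal sketch of the four relative measures described above. The
	# sample `data` is hypothetical. As assumptions, the standard deviation is
	# the population form, the mean deviation is taken about the mean, and the
	# quartiles follow the default method of statistics.quantiles.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

high, low = max(data), min(data)
coefficient_of_range = (high - low) / (high + low)

mean = statistics.mean(data)
# Coefficient of variation, expressed as a percentage.
coefficient_of_variation = statistics.pstdev(data) / mean * 100

mean_deviation = sum(abs(x - mean) for x in data) / len(data)
coefficient_of_mean_deviation = mean_deviation / mean

q1, _, q3 = statistics.quantiles(data, n=4)
coefficient_of_quartile_deviation = (q3 - q1) / (q3 + q1)

print(coefficient_of_range, coefficient_of_variation)
print(coefficient_of_mean_deviation, coefficient_of_quartile_deviation)
```

	# Each result is a ratio of quantities in the same unit, so the unit
	# cancels and the values can be compared across different data sets.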

	st.subheader(":green[Distribution Measures :]")
	st.write("The distribution concerns the frequency of each value.")

	st.header(":blue[Frequency Distribution :]")
	st.write("Frequency Distribution - A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.")
	st.image("https://thirdspacelearning.com/wp-content/uploads/2023/08/Frequency-Distribution-us-featured-image.png")
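	# A minimal sketch of a frequency distribution, reusing the data set from
	# the mode example. The `table` name is a hypothetical choice; each row
	# holds a value, its count, and its share of all observations in percent.

```python
from collections import Counter

data = [5, 4, 2, 3, 2, 1, 5, 4, 5]

counts = Counter(data)
total = len(data)

# One row per distinct value: (value, frequency, percentage of the total).
table = [(value, counts[value], 100 * counts[value] / total)
         for value in sorted(counts)]

for value, frequency, percent in table:
    print(f"{value}\t{frequency}\t{percent:.1f}%")
```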